AdaBiM: An adaptive proximal gradient method for structured convex bilevel optimization
Bilevel optimization is a comprehensive framework that bridges single- and
multi-objective optimization. It encompasses many general formulations,
including, but not limited to, standard nonlinear programs. This work
demonstrates how elementary proximal gradient iterations can be used to solve a
wide class of convex bilevel optimization problems without involving
subroutines. Compared to and improving upon existing methods, ours (1) can
handle a wider class of problems, including nonsmooth terms in the upper and
lower level problems, (2) does not require strong convexity or global Lipschitz
gradient continuity assumptions, and (3) provides a systematic adaptive
stepsize selection strategy, allowing for the use of large stepsizes while
being insensitive to the choice of parameters.
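To fix ideas on the simple-bilevel setting the abstract refers to, the sketch below implements the classical iterative-regularization idea, not AdaBiM itself: plain gradient steps on the lower-level cost plus a vanishing multiple of the upper-level cost. All names, the toy problem, and the weight schedule are illustrative assumptions.

```python
import numpy as np

def bilevel_iter_reg(grad_low, grad_up, x0, gamma, iters=5000):
    """Iterative-regularization sketch for the simple bilevel problem
    'minimize f_up over the minimizers of f_low': take gradient steps
    on f_low + sigma_k * f_up with sigma_k -> 0. Illustrative only;
    AdaBiM uses adaptive proximal gradient iterations instead."""
    x = x0.copy()
    for k in range(iters):
        sigma = 1.0 / (k + 1)                     # vanishing upper-level weight
        x = x - gamma * (grad_low(x) + sigma * grad_up(x))
    return x

# lower level: least squares with many solutions; upper level: minimal norm
A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
x = bilevel_iter_reg(lambda x: A.T @ (A @ x - b),   # grad of 0.5||Ax-b||^2
                     lambda x: x,                   # grad of 0.5||x||^2
                     np.zeros(3), gamma=0.1)
```

With the vanishing weight, the iterates track the Tikhonov path and approach the minimum-norm least-squares solution.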
Bregman Finito/MISO for nonconvex regularized finite sum minimization without Lipschitz gradient continuity
We introduce two algorithms for nonconvex regularized finite sum
minimization, where typical Lipschitz differentiability assumptions are relaxed
to the notion of relative smoothness. The first one is a Bregman extension of
Finito/MISO, studied for fully nonconvex problems when the sampling is random,
or under convexity of the nonsmooth term when it is essentially cyclic. The
second algorithm is a low-memory variant, in the spirit of SVRG and SARAH, that
also allows for fully nonconvex formulations. Our analysis is made remarkably
simple by employing a Bregman Moreau envelope as Lyapunov function. In the
randomized case, linear convergence is established when the cost function is
strongly convex, yet with no convexity requirements on the individual functions
in the sum. For the essentially cyclic and low-memory variants, global and
linear convergence results are established when the cost function satisfies the
Kurdyka-Łojasiewicz property.
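As a concrete reference point, here is a minimal Euclidean Finito/MISO-style sketch for a smooth finite sum with random sampling: a table of points and gradients is kept, and one random entry is refreshed per iteration. The paper's Bregman variant replaces the underlying quadratic geometry with a Bregman distance and also handles a nonsmooth regularizer, both omitted here; the toy problem and names are illustrative.

```python
import numpy as np

def finito(grads, n, x0, gamma, iters=500, seed=0):
    """Euclidean Finito/MISO sketch for min (1/n) sum_i f_i(x)."""
    rng = np.random.default_rng(seed)
    table = [x0.copy() for _ in range(n)]          # one anchor point per f_i
    gtable = [grads[i](x0) for i in range(n)]      # gradient at each anchor
    for _ in range(iters):
        x = np.mean(table, axis=0) - gamma * np.mean(gtable, axis=0)
        j = rng.integers(n)                        # random (i.i.d.) sampling
        table[j], gtable[j] = x, grads[j](x)       # refresh one table entry
    return np.mean(table, axis=0) - gamma * np.mean(gtable, axis=0)

# toy finite sum: f_i(x) = 0.5||x - a_i||^2, whose minimizer is mean(a_i)
rng = np.random.default_rng(1)
a = rng.standard_normal((5, 3))
grads = [lambda x, ai=ai: x - ai for ai in a]
x_star = finito(grads, n=5, x0=np.zeros(3), gamma=0.5)
```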
Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient
Backtracking linesearch is the de facto approach for minimizing continuously
differentiable functions with locally Lipschitz gradient. In recent years, it
has been shown that in the convex setting it is possible to avoid linesearch
altogether, and to allow the stepsize to adapt based on a local smoothness
estimate without any backtracks or evaluations of the function value. In this
work we propose an adaptive proximal gradient method, adaPG, that uses novel
estimates of the local smoothness modulus, which lead to less conservative
stepsize updates and that can additionally cope with nonsmooth terms. This idea
is extended to the primal-dual setting where an adaptive three-term primal-dual
algorithm, adaPD, is proposed which can be viewed as an extension of the PDHG
method. Moreover, in this setting an "essentially" fully adaptive variant,
adaPD+, is proposed that avoids evaluating the linear operator norm by
invoking a backtracking procedure that, remarkably, does not require extra
gradient evaluations. Numerical simulations demonstrate the effectiveness of
the proposed algorithms compared to the state of the art.
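The backtracking-free idea can be sketched in the smooth case with a Malitsky-Mishchenko-style adaptive gradient method, which adaPG refines and extends to nonsmooth terms via a proximal operator (omitted here). The stepsize grows geometrically but is capped by a local inverse-Lipschitz estimate computed from consecutive iterates; no function values or linesearch are needed. The toy problem and names are illustrative.

```python
import numpy as np

def adgd(grad, x0, lam0=1e-6, iters=1000):
    """Adaptive gradient descent sketch: stepsize capped by the local
    estimate ||x_k - x_{k-1}|| / (2 ||g_k - g_{k-1}||), no backtracks."""
    x_prev, g_prev = x0, grad(x0)
    x = x0 - lam0 * g_prev
    lam_prev, theta = lam0, 0.0
    for _ in range(iters):
        g = grad(x)
        lam = np.sqrt(1.0 + theta) * lam_prev      # controlled growth
        den = 2.0 * np.linalg.norm(g - g_prev)
        if den > 0:                                # local smoothness cap
            lam = min(lam, np.linalg.norm(x - x_prev) / den)
        theta, lam_prev = lam / lam_prev, lam
        x_prev, g_prev = x, g
        x = x - lam * g
    return x

# least squares: no global Lipschitz constant supplied a priori
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10)); b = rng.standard_normal(20)
x_ls = adgd(lambda x: A.T @ (A @ x - b), np.zeros(10))
```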
Distributed proximal algorithms for large-scale structured optimization
Efficient first-order algorithms for large-scale distributed optimization
are the main subject of investigation in this thesis.
The algorithms considered cover a wide array of applications
in machine learning, signal processing and control.
In recent years, a large number of algorithms have been introduced
that rely on (possibly a reformulation of) one of the
classical splitting algorithms, specifically forward-backward,
Douglas-Rachford and forward-backward-forward splittings.
In this thesis a new three-term splitting technique is developed
that recovers forward-backward and Douglas-Rachford
splittings as special cases. In the context of structured optimization,
this splitting is leveraged to develop a framework
for a large class of primal-dual algorithms providing a unified
convergence analysis for many seemingly unrelated algorithms.
Moreover, linear convergence is established for all
such algorithms under mild regularity conditions for the cost
functions.
As another notable contribution we propose a randomized
block-coordinate primal-dual algorithm that leads to a fully
distributed asynchronous algorithm in a multi-agent model.
Moreover, when specializing to multi-agent structured optimization
over graphs, novel algorithms are proposed. In addition,
it is shown that in a multi-agent model bounded communication
delays are tolerated by primal-dual algorithms
provided that certain strong convexity assumptions hold.
In the final chapter we depart from convex analysis and consider
a fully nonconvex block-coordinate proximal gradient
algorithm and show that it leads to nonconvex incremental
aggregated algorithms for regularized finite sum and sharing
problems with very general sampling strategies.
Asymmetric forward-backward-adjoint splitting for solving monotone inclusions involving three operators
© 2017, Springer Science+Business Media New York. In this work we propose a new splitting technique, namely Asymmetric Forward–Backward–Adjoint splitting, for solving monotone inclusions involving three terms: a maximally monotone, a cocoercive, and a bounded linear operator. Our scheme cannot be recovered from existing operator splitting methods, while classical methods like Douglas–Rachford and Forward–Backward splitting are special cases of the new algorithm. Asymmetric preconditioning is the main feature of Asymmetric Forward–Backward–Adjoint splitting; it allows us to unify, extend and shed light on the connections between many seemingly unrelated primal-dual algorithms for solving structured convex optimization problems proposed in recent years. One important special case leads to a Douglas–Rachford type scheme that includes a third cocoercive operator.
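Douglas–Rachford splitting, named in the abstract as a special case of the new scheme, can be sketched as follows for minimizing a sum f + g of two convex functions via their proximal operators (stepsize fixed at 1; the toy problem and names are illustrative, not the AFBA iteration itself).

```python
import numpy as np

def douglas_rachford(prox_f, prox_g, z0, iters=300):
    """Douglas-Rachford splitting sketch for min f(x) + g(x)."""
    z = z0.copy()
    for _ in range(iters):
        x = prox_f(z)                # forward point
        y = prox_g(2 * x - z)        # reflected proximal step
        z = z + y - x                # governing-sequence update
    return prox_f(z)

# toy problem: min ||x||_1 + 0.5||x - b||^2, closed form: soft(b, 1)
b = np.array([3.0, -0.2, 1.5])
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
prox_f = lambda z: soft(z, 1.0)          # prox of ||.||_1
prox_g = lambda z: (z + b) / 2.0         # prox of 0.5||.-b||^2
x = douglas_rachford(prox_f, prox_g, np.zeros(3))
```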
A New Randomized Block-Coordinate Primal-Dual Proximal Algorithm for Distributed Optimization
This paper proposes TriPD, a new primal-dual algorithm for minimizing the sum of a Lipschitz-differentiable convex function and two possibly nonsmooth convex functions, one of which is composed with a linear mapping. We devise a randomized block-coordinate version of the algorithm which converges under the same stepsize conditions as the full algorithm. It is shown that both the original as well as the block-coordinate scheme feature linear convergence rate when the functions involved are either piecewise linear-quadratic, or when they satisfy a certain quadratic growth condition (which is weaker than strong convexity). Moreover, we apply the developed algorithms to the problem of multi-agent optimization on a graph, thus obtaining novel synchronous and asynchronous distributed methods. The proposed algorithms are fully distributed in the sense that the updates and the stepsizes of each agent only depend on local information. In fact, no prior global coordination is required.
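The primal-dual template that TriPD builds on can be illustrated with the classical PDHG iteration for min f(x) + g(Lx); the sketch below (toy 1D total-variation denoising, illustrative parameters) is not TriPD itself and omits the smooth third term and the block-coordinate sampling.

```python
import numpy as np

def pdhg(prox_tf, prox_sgconj, L, x0, y0, tau, sigma, iters=2000):
    """Classic PDHG sketch for min f(x) + g(Lx).
    Requires tau * sigma * ||L||^2 <= 1."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        x_new = prox_tf(x - tau * (L.T @ y), tau)               # primal step
        y = prox_sgconj(y + sigma * (L @ (2 * x_new - x)), sigma)  # dual step
        x = x_new
    return x, y

# toy 1D TV denoising: min 0.5||x-b||^2 + lam*||Dx||_1
n, lam = 8, 0.5
D = np.eye(n - 1, n, 1) - np.eye(n - 1, n)          # finite differences
b = np.array([0., 0., 0., 5., 5., 5., 0., 0.])
prox_f = lambda z, t: (z + t * b) / (1 + t)         # prox of 0.5||.-b||^2
proj = lambda y, s: np.clip(y, -lam, lam)           # prox of (lam||.||_1)*
x, _ = pdhg(prox_f, proj, D, np.zeros(n), np.zeros(n - 1),
            tau=0.25, sigma=0.5)
```

Here tau * sigma * ||D||^2 <= 0.25 * 0.5 * 4 = 0.5, so the stepsize condition holds.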